Major Wrestling Championship Data - Exploratory Data Analysis

Import Libraries

Import our first table of data

Let's see how it looks as a dataframe

Let's take a look at the columns

Let's rename the columns, remove the extra column and the extra row at the top, then change the data type for some of our columns.
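A minimal sketch of this cleanup step, assuming the raw table arrives with generic column labels, a repeated header row at the top, and one extra column. The sample rows, the `notes` column, and all column names here are hypothetical stand-ins, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical raw table: generic labels, a repeated header row, an extra column.
raw = pd.DataFrame(
    [
        ["Champion", "Reigns", "Duration", "Notes"],
        ["Wrestler A", "2", "120 days", "x"],
        ["Wrestler B", "1", "<1 day", "y"],
    ],
    columns=[0, 1, 2, 3],
)

cleaned = raw.copy()
cleaned.columns = ["champion", "reigns", "duration", "notes"]  # rename the columns
cleaned = cleaned.drop(columns=["notes"])                      # remove the extra column
cleaned = cleaned.iloc[1:].reset_index(drop=True)              # remove the extra top row
cleaned["reigns"] = cleaned["reigns"].astype(int)              # text -> integer

print(cleaned.dtypes)
```

The `duration` column is left as text here because it still contains words and symbols.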

Let's check how it looks now

All good so far. Let's sort it by how many times each person was champion, then by how long they were champion.

Our duration is a string type, which is a mix of letters and numbers. We want an int or a float, but to get that we need to remove the text and any other characters.

Okay, let's see how it looks so far. We attempted to strip "days" from the Duration column, so I expect the word "days" to be gone.

"Days" is gone but "day" is still there. We also have a few rows where the person was champion for less than one day. Let's set them all to 1. Then we can change the data type. This will be helpful later on when we add more data to the project.

Let's check it out again and see if it looks okay
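The duration cleanup described above can be sketched roughly as follows. The sample strings are assumptions about what the column looks like (including the thousands separator). The notebook's own steps may differ.

```python
import pandas as pd

# Hypothetical duration strings of the kinds described in the text.
durations = pd.Series(["1,027 days", "35 days", "1 day", "<1 day"])

days = (
    durations
    .str.replace(",", "", regex=False)          # drop thousands separators
    .str.replace(r"\s*days?$", "", regex=True)  # strip both "day" and "days"
    .str.replace(r"^<\s*1$", "1", regex=True)   # treat "<1" as one day
    .astype(int)                                # now safe to convert
)

print(days.tolist())
```

Matching `days?` handles the singular and plural in one pass, which avoids the leftover "day" seen earlier.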

No "day", "days" or "<" in our way. All looks good.

Let's add in our second table.

First we had WWE Championship data, now we have AEW Championship data. We need to apply all the same steps we used on our first table and then check it.

That seems to be okay. The table is smaller because this championship hasn't been around for very long.

Let's import our third table.

TNA Impact Wrestling's data

Same issues, so let's do the same thing to clean this table.

Looks good. Let's add in the next table: WWE's Universal Championship.

Same thing, let's clean the table

After taking a peek at the data, we can't convert the duration column to an int yet.

After setting it to 1 like we did before, I realized Bray Wyatt was listed under a different name. Let's change his name back to Bray Wyatt.

Now that we've completed that, let's merge our two WWE tables together.

Looks like the merge was successful. Let's fill in a 0 wherever anything is empty.
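Here is a small sketch of the merge-then-fill step, assuming an outer merge keyed on a `champion` column so that wrestlers who held only one of the titles are kept. The frames, column names, suffixes, and numbers are all hypothetical.

```python
import pandas as pd

# Hypothetical per-title summaries; names and numbers are made up.
wwe = pd.DataFrame(
    {"champion": ["Wrestler A", "Wrestler B"], "reigns": [3, 1], "days": [400, 30]}
)
universal = pd.DataFrame(
    {"champion": ["Wrestler B", "Wrestler C"], "reigns": [2, 1], "days": [100, 50]}
)

merged = wwe.merge(
    universal,
    on="champion",
    how="outer",                      # keep champions found in either table
    suffixes=("_wwe", "_universal"),  # disambiguate the shared column names
).fillna(0)                           # a missing title history means zero

print(merged)
```

An outer merge leaves `NaN` for any title a wrestler never held, and `fillna(0)` is what makes later sums across titles work.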

Let's see if that worked.

Good! Let's leave this here for now and come back to it. We have some more tables to add.

Let's import our ROH data and clean it like we did the other tables.

Let's move to the next table

WCW data

I had to comment out the data type conversion because someone else was champion for only a day.

I think most people know Hulk Hogan by that name and not as Hollywood Hogan, so let's change his name to Hulk Hogan everywhere. I'm sure this will probably be the only place it says something other than Hulk Hogan, since the name came from him starting the NWO and becoming a villain. Let's also change Sid Vicious and Psycho Sid to just Sid. We'll change him now in this table.

Yup, I was right to change Sid, but I missed that Kevin Nash wrestled under the name Diesel, so let's change that too.
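One way to sketch the renames above is a single mapping from ring-name variants to a canonical name, applied with `Series.replace`. The names come from the text, but the exact spellings in the scraped tables are an assumption, and the sample column is hypothetical.

```python
import pandas as pd

# Ring-name variants mentioned in the text, mapped to one canonical name.
NAME_MAP = {
    "Hollywood Hogan": "Hulk Hogan",
    "Sid Vicious": "Sid",
    "Psycho Sid": "Sid",
    "Diesel": "Kevin Nash",
}

# Hypothetical champion column; real tables may spell these differently.
champions = pd.Series(
    ["Hollywood Hogan", "Sid Vicious", "Psycho Sid", "Diesel", "Hulk Hogan"]
)

normalized = champions.replace(NAME_MAP)  # exact-match replacement only
print(normalized.tolist())
```

Keeping every alias in one dictionary means the same mapping can be reused on each table before merging, so a name fixed once stays fixed everywhere.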

Now let's re-merge the tables we have for WCW and WWE so far. We'll merge on the champion column, which we can use later to get the total reigns and duration for each wrestler.

Let's get rid of any empty values.

Let's see which columns are available now that we've merged these dataframes.

Let's see the data type for each column.

Let's get total days as champion and total reigns for each wrestler.
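A sketch of computing these totals, assuming the merged frame has one reigns column and one days column per title, with missing values already filled with 0. The column naming scheme and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical merged frame: one reigns/days pair of columns per title.
merged = pd.DataFrame(
    {
        "champion": ["Wrestler A", "Wrestler B"],
        "reigns_wwe": [3, 0],
        "days_wwe": [400, 0],
        "reigns_wcw": [1, 2],
        "days_wcw": [90, 150],
    }
)

# Select columns by prefix so that adding another title needs no code change.
reign_cols = [c for c in merged.columns if c.startswith("reigns_")]
day_cols = [c for c in merged.columns if c.startswith("days_")]

merged["total_reigns"] = merged[reign_cols].sum(axis=1)  # row-wise sum
merged["total_days"] = merged[day_cols].sum(axis=1)

print(merged[["champion", "total_reigns", "total_days"]])
```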

Let's take a look and make sure it looks okay.

Looks good. Now let's sort it like we did our other table earlier.

Let's pull the top 20 wrestlers from our merged dataframe.
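The sort and top-20 steps can be sketched like this, assuming columns named `total_reigns` and `total_days`. Both names and the generated rows are hypothetical.

```python
import pandas as pd

# Hypothetical totals for 25 wrestlers.
totals = pd.DataFrame(
    {
        "champion": [f"Wrestler {i}" for i in range(1, 26)],
        "total_reigns": [i % 5 + 1 for i in range(1, 26)],
        "total_days": [i * 10 for i in range(1, 26)],
    }
)

top20 = (
    totals
    .sort_values(["total_reigns", "total_days"], ascending=[False, False])
    .head(20)                 # keep the first 20 rows after sorting
    .reset_index(drop=True)
)

print(top20.head())
```

Sorting by reigns first and days second breaks ties between wrestlers with the same number of reigns in favor of the longer total time as champion.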

Let's place it on a scatter plot so that we have a good visual of what's going on.

Most of our champions are down in the bottom left corner, which means they had four or fewer reigns and were champion for fewer than 500 days. You can see how some names such as John Cena, Brock Lesnar or Hulk Hogan stand out.

Let's add in more data

Let's pull in the ECW TV title and treat it as a heavyweight title. Look at the champion list in hindsight. We'll also pull in the normal heavyweight title later.

Same as earlier, with someone not being able to retain their championship for more than a day. Let's add in 4 more tables.

We have to get the ECW Heavyweight title plus the WWE version of it. We also have to get the WWE version that carries on the WCW championship, and we need NWA's championship data.

That was all very similar to earlier. Let's merge all of our ECW data.

Let's remove the empty values and replace them with 0s

Now let's get the total days and total number of times each ECW wrestler was champion.

Let's look at our champion list

Let's change Tazz and any other variant of the name to just Taz.

Now let's re-merge our data.

OK, Taz is good, it's all good. Let's grab our WWE World Heavyweight title, the one from WCW...

King Booker has to change to Booker T.

I'm not sure if Mick Foley will pop up elsewhere, but we need the data from Mankind to be merged, so let's change Mankind to Mick Foley.

Now let's get our WCW/WWE data from earlier and add this heavyweight championship to it.

OK, now let's get our totals.

Looks good. Now let's check out the Impact data again.

I see Mick Foley here, but I also see other names from WWE.

Those are good. Let's check our NWA data.

I see more wrestlers from WWE. Let's fix it so everything merges.

3 more names to fix

Let's merge our Impact, ROH and NWA data.

Now let's merge everything together.

I'll have to fix this before a future Python version comes out, but we're OK for now. Let's look at our columns.

Let's filter this down to the 3 columns that we actually need

Let's sort the data by number of times as champion and how long they were champion.

Let's check the first 20 rows

Let's see a visual of it, but let's also fit it with ordinary least squares (OLS). We're going to get a formula for how long, on average, a wrestler is champion based on this data.

We used Plotly's OLS for that. Let's see if we get the same thing if we set up a Linear Regression model. Let's define X and Y.

Let's fit the data and plot it, similar to what we did with Plotly.
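As an independent cross-check of a straight-line least-squares fit, here is a sketch that uses `numpy.polyfit` instead of Plotly's trendline or the regression model used in the notebook. The data points are synthetic and lie exactly on a known line, so the recovered slope and intercept are easy to verify. They are not the notebook's data.

```python
import numpy as np

# Synthetic points that lie exactly on: days = 100 * reigns + 50
reigns = np.array([1, 2, 3, 4, 5], dtype=float)  # X: times as champion
days = 100.0 * reigns + 50.0                     # Y: total days as champion

# A degree-1 polynomial fit is an ordinary least-squares line.
slope, intercept = np.polyfit(reigns, days, deg=1)
predicted = slope * reigns + intercept

print(f"days = {slope:.1f} * reigns + {intercept:.1f}")
```

On real data the points will not sit on the line, but the slope and intercept should closely match what any other OLS implementation reports for the same X and Y.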

That looks good. Let's take another look at our data.

Our average champ is a 3x champ for about 400 days. Let's visualize our data in a different way. It will give you a different perspective on our multi-time champions and how rare they are.

A 10x champ or more is pretty rare

Let's compare our Linear Regression model from all the data to the one from just the WWE/WCW data. For the WWE/WCW data our equation was 84.6 times the number of times as champion plus 107 days. With all of the data our equation was 105.9 times the number of times as champion plus 99 days.

It's not exactly the same, but it's fairly similar. Notice that the older-generation wrestlers are the ones far above our linear regression line on both charts. Today's wrestlers and a lot of the fan favorites are toward the bottom right, where they had more reigns but a shorter total time as champion.
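To make the comparison concrete, the two fitted equations quoted above can be evaluated side by side. The coefficients come from the text. The helper function and its name are just for illustration.

```python
def predicted_days(times_champion, slope, intercept):
    """Evaluate a fitted line: days = slope * times_champion + intercept."""
    return slope * times_champion + intercept


# (slope, intercept) pairs quoted in the text.
WWE_WCW = (84.6, 107.0)   # fit on the WWE/WCW data only
ALL_DATA = (105.9, 99.0)  # fit on all promotions combined

for n in (1, 3, 10):
    a = predicted_days(n, *WWE_WCW)
    b = predicted_days(n, *ALL_DATA)
    print(f"{n}x champion: {a:.1f} vs {b:.1f} days (gap {b - a:.1f})")
```

For a three-time champion the two lines predict about 360.8 and 416.7 days. The gap grows with the number of reigns because the all-data line has the steeper slope.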